Data Visualization

Lucas S. Macoris

Outline

Coding Replications

For coding replications, whenever applicable, please follow this page or hover over the slides that contain code chunks.

  1. Ensure that you have your session properly set up according to the instructions outlined on the course webpage
  2. Along with the slides, this lecture comes with a replication file, in .qmd format, containing a thorough discussion of all the examples showcased. This file, which will be posted on eClass®, can be downloaded and replicated on your side. To do that, download the file, open it in RStudio, and render the Quarto document using the Render button (shortcut: Ctrl+Shift+K).
  3. At the end of this lecture, you will be prompted with a hands-on exercise to test your skills using the tools you’ve learned as you made your way through the slides. A suggested solution will be provided in the replication file.

Visualizing Financial Data

  • Part of your work as a Financial Analyst is to convey information to a broader public:
    1. Oftentimes, fund managers do not use R or Python, and won’t be proofreading your .xlsx file
    2. Investors, journalists, and the general audience alike may not have the skills to ingest and analyze data like you do
  • It is imperative to think about how to structure your message so that it is communicated effectively and meets your audience’s needs

Introducing: the Grammar of Graphics

The Grammar of Graphics sets up the foundations that underlie the production of all types of charts, including pie charts, bar charts, scatterplots, and many more. In that sense, the Grammar of Graphics provides a single foundation for producing charts from quantitative information, and these charts are widely used in scientific journals, newspapers, statistical packages, and data visualization systems.

  • How can you implement these foundations to showcase your data analysis? Luckily, there is a way to do that in R: I introduce you to the wonderful world of ggplot2

Introducing ggplot2

  1. You start with a call to ggplot(), supplying the data and an aesthetic mapping (aes), like the x and y axes, groupings, etc.
  2. After that, you choose the geometry (geom), the shape of the visual elements contained in the visualization
  3. Finally, you add layers on top of the geometry (titles, annotations, etc.) and customize your theme (font size, background color, etc.)

Key Highlights

  1. It is, by and large, the richest and most widely used plotting ecosystem in the R language
  2. ggplot2 has a rich ecosystem of extensions, ranging from annotations and interactive visualizations to specialized genomics - see the community-maintained list of ggplot2 extensions

ggplot2 foundations

  • We will illustrate the use of ggplot2 to replicate the Grammar of Graphics foundations using the FANG dataset, which is loaded together with your slides - if you prefer to do it directly in R, hit the download button and load it using read_delim('FANG.txt')

  • To get ggplot2 in your session, either load the tidyverse altogether or load the library directly:

#Load the tidyquant package
library(tidyquant)

#Option 1: load the tidyverse, which includes ggplot2
library(tidyverse)

#Option 2: load ggplot2 directly
library(ggplot2)
  • In what follows, we will be working with a step-by-step example of how to use ggplot2 for data visualizations

Step 1: the data

  • We will be using the FANG dataset, which contains basic stock information from popular U.S. technology firms: Facebook (Meta), Amazon, Netflix, and Google (Alphabet)

  • The first step in using ggplot2 is to pass your data frame and supply the aesthetic mapping, which we’ll refer to as aes

ggplot(data=your_data, aes(x= variable_1, y=variable_2, ...))
  1. The data argument refers to the dataset used
  2. The aes argument contains all the aesthetic mappings that will be used
  • Together, these constitute the backbone of your visualization: they tell ggplot2 which raw information to use and where it should be mapped!
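
As a minimal sketch of this first step, the snippet below builds the backbone on a small made-up data frame. The name meta_demo and its prices are hypothetical stand-ins for the META data, not actual quotes. It illustrates that the resulting object stores the data and the mapping but contains no layers yet, which is why only empty axes are drawn.

```r
library(ggplot2)

# Made-up monthly prices (hypothetical stand-in for the META data)
meta_demo <- data.frame(
  date     = seq(as.Date("2022-01-01"), by = "month", length.out = 36),
  symbol   = "META",
  adjusted = 300 + 5 * seq_len(36) + 20 * sin(seq_len(36) / 3)
)

# Data + aesthetic mapping only: no geom has been added yet
p <- ggplot(data = meta_demo, aes(x = date, y = adjusted, group = symbol))

# Inspect what the object knows so far
length(p$layers)   # number of layers currently attached to the plot
names(p$mapping)   # names of the aesthetics that were mapped
```

Printing p at this stage draws the axes for date and adjusted, but nothing inside them.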

Step 1: the data, practice

Listing 1: Use the newly created META dataset and call ggplot, mapping the date variable to the x axis, the adjusted variable to the y axis, and symbol to the group aesthetic. The FANG dataset and ggplot2 have already been loaded for you. Even if you submit the wrong answer, a live-tutoring feature will provide you with a handful of tips to adjust your code and resubmit your solution.

Use the ggplot() function together with aes(x, y, group):

#Let's use Meta (META) adjusted prices
META=FANG%>%filter(symbol=='META')

#Use ggplot2 to map the aesthetics to the plot
ggplot(META, aes(x=date,y=adjusted,group=symbol))

Step 2: adding your geom

  • You probably thought you did something wrong when you saw an empty chart with only the named axes, right? However, I can assure you: you did great!

    1. It is all about the philosophy embedded in the Grammar of Graphics: you first provide the data and the aes(thetic) mapping to your data

    2. Now, ggplot knows exactly which information to select and where to place it. However, it is still agnostic about how to display it

  • We will now add a geometry layer - in short, a geom:

    1. You can add layers on top of a ggplot object using the addition symbol (+)
    2. There are many types of potential geometries, to name a few: geom_point(), geom_col(), geom_line() - see the ggplot2 reference for a complete list
#Use ggplot2 to map the aesthetics to the plot
ggplot(META, aes(x=date,y=adjusted)) +
#Map a given geometry
geom_{yourgeomhere}()
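
To make the template above concrete, here is a hedged sketch that reuses one base object with three different geoms. The data frame meta_demo holds made-up prices and is only an assumption for illustration.

```r
library(ggplot2)

# Made-up monthly prices (hypothetical stand-in for the META data)
meta_demo <- data.frame(
  date     = seq(as.Date("2022-01-01"), by = "month", length.out = 36),
  adjusted = 300 + 5 * seq_len(36) + 20 * sin(seq_len(36) / 3)
)

# Shared backbone: data + aesthetic mapping, no geometry yet
base <- ggplot(meta_demo, aes(x = date, y = adjusted))

# Same backbone, three different geometries added with +
p_point <- base + geom_point()
p_col   <- base + geom_col()
p_line  <- base + geom_line()

# Draw the line version
p_line
```

Because the backbone is stored in base, switching the geometry only requires changing the last term of the sum.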

Step 2: adding your geom, practice

Listing 2: In your ggplot object, try out the following geoms: geom_point(), geom_col(), and geom_line(). Which one do you think is best for the task? The FANG dataset and ggplot2 have already been loaded for you.

In general, geom_line() is the best fit for time series

#Let's use Meta (META) adjusted prices
META=FANG%>%filter(symbol=='META')

#Use ggplot2 to map the aesthetics to the plot
ggplot(META, aes(x=date,y=adjusted,group=symbol)) +
geom_line()

Step 3: be creative with additional layers

  • Your main chart is now all set:

    1. It contains the data and the necessary aes(thetic) mappings to the chart;
    2. It also contains a shape, or geom(etry), that was selected to display the data
  • The philosophy behind the Grammar of Graphics is now to add layers of information on top of the base chart using the + operator, like before

  • We will proceed by including several layers of information that will either add or modify the behavior of the chart, making it more appealing to our audience:

    1. Adding trend-lines using geom_smooth()
    2. Adding annotations and labels using annotate() and labs()
    3. Modifying the behavior of the scales using the scale_y_*() and scale_x_*() functions
  • Try to sequentially add these layers and re-run the code to see how it reflects on the output!
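
The sketch below stacks these kinds of layers on a small made-up series, including a free-standing text annotation via annotate(). The data frame meta_demo, the annotation date, and the label text are all hypothetical choices made for illustration.

```r
library(ggplot2)

# Made-up monthly prices (hypothetical stand-in for the META data)
meta_demo <- data.frame(
  date     = seq(as.Date("2022-01-01"), by = "month", length.out = 36),
  adjusted = 300 + 5 * seq_len(36) + 20 * sin(seq_len(36) / 3)
)

p <- ggplot(meta_demo, aes(x = date, y = adjusted)) +
  # Raw series
  geom_line() +
  # Smoothed trend drawn on top of the raw series
  geom_smooth(method = "loess", formula = y ~ x) +
  # Free-standing text placed at a chosen (date, price) coordinate
  annotate(geom = "text", x = as.Date("2023-01-01"), y = 450,
           label = "Hypothetical event") +
  # Axis labels, title, and subtitle
  labs(x = "Date", y = "Adjusted Prices",
       title = "Demo prices over time", subtitle = "Source: made-up data")

p
```

Each term after the first + can be commented out and the code re-run to see what that layer contributes.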

Step 3: be creative with additional layers, practice

Listing 3: In your previously created ggplot object, add a smoothed trend of adjusted prices using the geom_smooth(method='loess') geometry and adjust the labels of your axes, chart title, and subtitle. You can pass additional layers using the + operator. For changing the labels, you can use the labs(x='Your X Label',y='Your Y Label', title='Your Title', subtitle='Your subtitle') syntax. The x-axis should be called “Date”, the y-axis should be called “Adjusted Prices”, the title should be “META Prices Over Time”, and the subtitle should be “Source: Yahoo! Finance”. The FANG dataset and ggplot2 have already been loaded for you.

You can call geom_smooth() with method='loess' to add a smoothed trend on top of your chart, and customize your labels by calling the labs() function. You can chain these operations on top of your chart using the + sign.

#Let's use Meta (META) adjusted prices
META=FANG%>%filter(symbol=='META')

#Use ggplot2 to map the aesthetics to the plot
ggplot(META, aes(x=date,y=adjusted,group=symbol)) +
geom_line()+
#Adding a trend
geom_smooth(method='loess')+
#Adding Annotations
labs(title='META Prices Over Time',
     subtitle = 'Source: Yahoo! Finance',
     x = 'Date',
     y = 'Adjusted Prices')

More annotations

  • Apart from simply changing the labels of your axes, titles, and subtitles, you can also use ggplot2 to customize the appearance of your axes:

    1. The scale_x_{} family of functions applies a given structure to the x-axis - e.g., scale_x_date(), scale_x_continuous()
    2. The scale_y_{} family of functions applies a given structure to the y-axis - e.g., scale_y_continuous()
  • With that, you can, for example:

    1. Force the x-axis to be formatted as a date, adjusting how it is being displayed
    2. Force the y-axis to be formatted in terms of dollar amounts
  • In this way, you can impose meaningful structures on your chart depending on the type of data you are mapping to the x and y axes!

More annotations, continued

  • For example, the code snippet below formats the x-axis to show breaks at the year level, and formats the y-axis in such a way that it goes from \(\small\$0\) to \(\small\$1,000\) by increments of \(\small\$50\)
  #Your previous ggplot call up to now
  {your_previous_ggplot} +
  #Changing the behavior of scales
  scale_x_date(date_breaks = '1 year',labels = year) +
  scale_y_continuous(labels = dollar, breaks = seq(from=0,to=1000,by=50))
  • See the ggplot2 documentation for a comprehensive list of the customizations available for continuous scales on both axes (scale_x_continuous() and scale_y_continuous())

  • See the ggplot2 documentation for a comprehensive list of the customizations available for date scales on both axes (scale_x_date() and scale_y_date())

Formatting scales

To properly format the appearance of your axes, make sure the scales package is installed and loaded. You can do so by calling install.packages('scales') and library(scales).
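
As a small, hedged illustration of what the scales package contributes, the sketch below first applies dollar() to a few numbers and then uses it as the labelling function of the y-axis. The data frame meta_demo is made up, and date_labels = "%Y" is used here as an alternative to labels = year that does not rely on any date-handling package being loaded.

```r
library(ggplot2)
library(scales)

# dollar() converts numbers into currency-formatted strings
dollar(c(0, 50, 1000))

# Breaks from 0 to 1,000 in steps of 50
y_breaks <- seq(from = 0, to = 1000, by = 50)

# Made-up monthly prices (hypothetical stand-in for the META data)
meta_demo <- data.frame(
  date     = seq(as.Date("2022-01-01"), by = "month", length.out = 36),
  adjusted = 300 + 5 * seq_len(36) + 20 * sin(seq_len(36) / 3)
)

p <- ggplot(meta_demo, aes(x = date, y = adjusted)) +
  geom_line() +
  # One break per year, printed as a four-digit year
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  # Dollar-formatted labels at the breaks defined above
  scale_y_continuous(labels = dollar, breaks = y_breaks)

p
```

Only the breaks that fall inside the range of the data are displayed, so the chart does not stretch to the full 0 to 1,000 interval unless limits are also set.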

Step 4: customize your axis

Listing 4: Using your previously created ggplot object, customize the appearance of the x-axis and y-axis in the following way: the x-axis should be formatted as a date using an appropriate function that shows each year as a breakpoint, whereas the y-axis should be formatted in dollar terms, ranging from zero to one thousand dollars in increments of 50, using an appropriate function. You can pass additional layers using the + operator. The FANG dataset and ggplot2 have already been loaded for you.

Use scale_x_date() with the appropriate arguments to format the x-axis, doing the same thing for the y-axis using scale_y_continuous():

#Let's use Meta (META) adjusted prices
META=FANG%>%filter(symbol=='META')

#Use ggplot2 to map the aesthetics to the plot
ggplot(META, aes(x=date,y=adjusted,group=symbol)) +
geom_line()+
#Adding a trend
geom_smooth(method='loess')+
#Adding Annotations
labs(title='META adjusted prices',
     subtitle = 'Source: Yahoo! Finance',
     x = 'Year',
     y = 'Adjusted Prices')+
#Changing the behavior of scales
scale_x_date(date_breaks = '1 year',labels = year) +
scale_y_continuous(labels = dollar, breaks = seq(from=0,to=1000,by=50))

Adding multiple data points

Question: what if we wanted to add more data?

  1. In our first example, we used filter(symbol=='META') to include only information from Meta in the chart
  2. However, one might be interested in understanding how Meta performed relative to its FANG peers
  • It is easy to do it with ggplot:
    1. Because you have set group=symbol, ggplot already knows that it needs to group by each different string contained in the symbol column
    2. In such a way, all you need to do is to add a new aes mapping, colour=symbol, so that ggplot knows that each symbol needs to have a different color!
  • In what follows, we will be charting all four FANG stocks in the same chart, adjusting the layers to try keeping aesthetics as good as possible

Adding multiple data points, practice
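
A minimal sketch of this idea is shown below, assuming a made-up data set with one simulated price path per ticker. The name fang_demo, the starting levels, and the simulated prices are hypothetical and do not reflect market data.

```r
library(ggplot2)

# Simulated price paths for the four tickers (hypothetical values, not market data)
set.seed(42)
dates  <- seq(as.Date("2022-01-01"), by = "month", length.out = 36)
starts <- c(META = 300, AMZN = 150, NFLX = 500, GOOG = 120)
fang_demo <- do.call(rbind, lapply(names(starts), function(s) {
  data.frame(date = dates, symbol = s,
             adjusted = starts[[s]] + cumsum(rnorm(length(dates), mean = 1, sd = 4)))
}))

# group keeps one line per ticker; colour gives each ticker its own color
p <- ggplot(fang_demo, aes(x = date, y = adjusted, group = symbol, colour = symbol)) +
  geom_line() +
  labs(x = "Date", y = "Adjusted Prices", colour = "Ticker",
       title = "Demo: four tickers in one chart")

p
```

Mapping colour inside aes() also makes ggplot2 add a legend automatically, which labs(colour = ...) renames.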

Facet it until you make it

  • We have included all FANG stocks into the same chart. Easy peasy, lemon squeezy!

  • However much we adjust the layers, the chart seems to convey too much information:

    1. Because of the different scales, you can hardly tell the difference between AMZN and GOOG during 2015-2018
    2. Furthermore, trend lines are, in some cases, effectively hiding the data underneath
  • Although you could easily remove the trend lines, ggplot2 also comes with a variety of alternatives for charting multiple series that may come in handy:

    1. You can facet your chart using facet_wrap(), controlling the axes as well as the number of rows and columns
    2. You can lay out your chart as a grid using facet_grid(), making the comparison easier with fixed axes

Option 1: using facet_wrap()
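
As a hedged sketch of this option, the code below splits a simulated four-ticker data set into one panel per ticker. The object fang_demo and its prices are made up for illustration.

```r
library(ggplot2)

# Simulated price paths for the four tickers (hypothetical values, not market data)
set.seed(42)
dates  <- seq(as.Date("2022-01-01"), by = "month", length.out = 36)
starts <- c(META = 300, AMZN = 150, NFLX = 500, GOOG = 120)
fang_demo <- do.call(rbind, lapply(names(starts), function(s) {
  data.frame(date = dates, symbol = s,
             adjusted = starts[[s]] + cumsum(rnorm(length(dates), mean = 1, sd = 4)))
}))

p_wrap <- ggplot(fang_demo, aes(x = date, y = adjusted, colour = symbol)) +
  geom_line() +
  # One panel per ticker in 2 columns; each panel gets its own y-axis range
  facet_wrap(~ symbol, ncol = 2, scales = "free_y") +
  # The panel titles already identify the tickers, so the legend is redundant
  theme(legend.position = "none")

p_wrap
```

With scales = "free_y", each panel zooms into its own price range, which helps when price levels differ widely but makes levels harder to compare across panels.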

Option 2: using facet_grid()
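
The sketch below shows the grid alternative on the same kind of simulated data, stacking one row per ticker with a shared y-axis. As before, fang_demo and its values are hypothetical.

```r
library(ggplot2)

# Simulated price paths for the four tickers (hypothetical values, not market data)
set.seed(42)
dates  <- seq(as.Date("2022-01-01"), by = "month", length.out = 36)
starts <- c(META = 300, AMZN = 150, NFLX = 500, GOOG = 120)
fang_demo <- do.call(rbind, lapply(names(starts), function(s) {
  data.frame(date = dates, symbol = s,
             adjusted = starts[[s]] + cumsum(rnorm(length(dates), mean = 1, sd = 4)))
}))

p_grid <- ggplot(fang_demo, aes(x = date, y = adjusted, colour = symbol)) +
  geom_line() +
  # One row per ticker; axes are fixed (shared) across panels by default
  facet_grid(rows = vars(symbol)) +
  theme(legend.position = "none")

p_grid
```

Because all rows share the same axes, differences in price levels across tickers are directly comparable.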

Adding themes: you’re in full control of your message!

  • By now, you are already looking like a data manipulation wizard in your firm:
    1. You have created a fully automated data ingestion process using tq_get() to get live FANG prices;
    2. You have set up ggplot to automatically update the chart;
    3. Finally, you have adjusted all aesthetics to make it look much more professional
  • Plus: you haven’t even opened Excel! Well, what’s next?
  • A lot of the ggplot adoption throughout the R universe relates to themes: complete configurations which control all non-data display
    1. There are many available themes that you can pass to your ggplot, like theme_minimal() and theme_bw()
    2. Alternatively, you can pass theme() if you just need to tweak the display of an existing theme

Adding themes() to your chart
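
As a minimal sketch, the snippet below applies two of the complete themes that ship with ggplot2 to the same base chart. The data frame meta_demo is made up, and base_size = 14 is just an illustrative choice.

```r
library(ggplot2)

# Made-up monthly prices (hypothetical stand-in for the META data)
meta_demo <- data.frame(
  date     = seq(as.Date("2022-01-01"), by = "month", length.out = 36),
  adjusted = 300 + 5 * seq_len(36) + 20 * sin(seq_len(36) / 3)
)

base <- ggplot(meta_demo, aes(x = date, y = adjusted)) +
  geom_line() +
  labs(x = "Date", y = "Adjusted Prices", title = "Demo prices over time")

# Complete themes replace every non-data element at once
p_minimal <- base + theme_minimal()
p_bw      <- base + theme_bw(base_size = 14)  # base_size scales all text

p_minimal
p_bw
```

The data and the geometry are identical in both charts; only the non-data display changes.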

The R community is on your side!

  • There are endless customizations that you could think of that could be applied to a theme

  • In particular, the ggthemes package provides extra themes, geoms, and scales for ggplot2 that replicate the look of famous charts that you have often looked at and said: “how could I replicate that?”

  • To get access to these additional graphical resources in your R session, install and load the package using:

install.packages('ggthemes') #Install if not available
library(ggthemes) #Load

your_previous_ggplot_object + theme_{insertyourtheme}
  • To check all available themes, see the ggthemes package website

Endless possibilities for theme customization

Further theme customization

  • Even with customized themes, you might still want to do your own customizations

  • It is easy to access each and every component of the chart by adding theme (using the + operator):

install.packages('ggthemes') #Install if not available
library(ggthemes) #Load

your_previous_ggplot_object +
  theme_{insertyourtheme}+
  theme(component_1 = configuration_1,
        component_2 = configuration_2,
        ...
        component_n = configuration_n)
  • In what follows, we will be using the theme() function to adjust some aspects of our chart, such as font size, angle, and text width, to make it look more professional

Doing custom theme() adjustments to the chart
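
The sketch below illustrates a few such adjustments on top of a complete theme. The data frame meta_demo is made up, and the specific sizes and angles are arbitrary choices for illustration rather than recommended values.

```r
library(ggplot2)

# Made-up monthly prices (hypothetical stand-in for the META data)
meta_demo <- data.frame(
  date     = seq(as.Date("2022-01-01"), by = "month", length.out = 36),
  adjusted = 300 + 5 * seq_len(36) + 20 * sin(seq_len(36) / 3)
)

p <- ggplot(meta_demo, aes(x = date, y = adjusted)) +
  geom_line() +
  labs(x = "Date", y = "Adjusted Prices", title = "Demo prices over time") +
  # Start from a complete theme...
  theme_minimal() +
  # ...then override individual components
  theme(plot.title   = element_text(size = 14, face = "bold"),
        axis.title.y = element_text(face = "bold"),
        axis.text.x  = element_text(angle = 45, hjust = 1, size = 8))

p
```

The order matters: theme() comes after theme_minimal() so that the individual overrides are not reset by the complete theme.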

Integrating tidyquant

  • As we saw in our previous lecture, tidyquant adds very important functionalities for those who work in finance, making it easy to manage financial time series using the well-established foundations of the tidyverse

  • When it comes to data visualization, tidyquant also provides a handful of integrations that can be inserted into your ggplot call:

    1. Possibility of using geom_barchart and geom_candlestick
    2. Moving average visualizations and Bollinger Bands available using geom_ma and geom_bbands
    3. A new theme, theme_tq, available

\(\rightarrow\) For a thorough discussion, see tidyquant’s documentation on its charting capabilities

Integrating tidyquant, continued

#Set up start and end dates
end=Sys.Date()
start=end-weeks(5)

FANG%>%
  #Make sure that date is read as a Date object
  mutate(date=as.Date(date))%>%
  #Filter
  filter(date >= start, date<=end)%>%
  #Basic layer - aesthetic mapping, grouped by symbol
  ggplot(aes(x=date,y=close,group=symbol))+
  #Charting data - you could use geom_line(), geom_col(), geom_point(), and others
  geom_candlestick(aes(open = open, high = high, low = low, close = close))+
  geom_ma(ma_fun = SMA, n = 5, color = "black", size = 0.25)+
  #Facetting
  facet_wrap(symbol~.,scales='free_y')+
  #DeepSeek date
  geom_vline(xintercept=as.Date('2025-01-24'),linetype='dashed')+
  #Annotations
  labs(title='FANG adjusted prices before/after DeepSeek announcement',
       subtitle = 'Source: Yahoo! Finance',
       x = 'Date',
       y = 'Adjusted Prices')+
  #Scales
  scale_x_date(date_breaks = '3 days') +
  scale_y_continuous(labels = dollar) +
  #Custom 'The Economist' theme
  theme_economist()+
  #Adding further customizations
  theme(legend.position='none',
        axis.title.y = element_text(vjust=+4,face='bold'),
        axis.title.x = element_text(vjust=-3,face='bold'),
        plot.subtitle = element_text(size=8,vjust=-2,hjust=0,margin = margin(b=15)),
        axis.text.y = element_text(size=8),
        axis.text.x = element_text(angle=90,size=8))

Alternatives to ggplot2

  • ggplot2 is, by and large, the richest and most widely used plotting ecosystem in the R language

  • However, there are also other interesting options, especially when it comes to interactive data visualization

    1. The plotly ecosystem provides interactive charts for R, Python, Julia, Java, among others - you can install the R package using install.packages('plotly')

    2. Highcharts is another option whenever there is a need for interactive data visualization - you can install the R package using install.packages('highcharter')

  • In particular, the highcharter package works seamlessly with time series data, especially those retrieved by tidyquant’s tq_get() function

Using the highcharter package

#Install the highcharter package (if not installed yet)
#install.packages('highcharter')

#Load the highcharter package (if not loaded yet)
library(highcharter)

#Select the Google Stock with OHLC information and transform to an xts object
GOOG=tq_get('GOOG')%>%select(-symbol)%>%as.xts()

  #Initialize an empty highchart
  highchart(type='stock')%>%
  #Add the Google Series
  hc_add_series(GOOG,name='Google')%>%
  #Add title and subtitle
  hc_title(text='A Dynamic Visualization of Google Stock Prices Over Time')%>%
  hc_subtitle(text='Source: Yahoo! Finance')%>%
  #Customize the tooltip
  hc_tooltip(valueDecimals=2,valuePrefix='$')%>%
  #Convert it to a 'The Economist' theme
  hc_add_theme(hc_theme_economist())

I hope you are excited about what’s next!

Hands-On Exercise

  • In late January 2021, Reddit traders took on the short-sellers by forcing them to liquidate their short positions in GameStop stock. This coordinated behavior had significant repercussions for various investment funds, such as Melvin Capital

Exercise

  1. Use tq_get() to load information for GameStop (ticker: GME) and store it in a data.frame. Using the from and to arguments of tq_get(), restrict the sample to observations between the beginning of December 2020 and the end of March 2021
  2. Use ggplot(aes(x=date,group=symbol)), along with geom_candlestick() and its appropriate arguments, to chart the historical OHLC prices
  3. Create a vertical line annotation using geom_vline, setting the xintercept argument to the date of the Reddit frenzy (as.Date('2021-01-25'))
  4. Use the theme from The Economist calling theme_economist(). Make sure to have the ggthemes package installed and loaded
  5. Finally, call theme() and labs() to adjust the aesthetics of your theme and labels as you think would best convey your message. For example, you can use the scales package to format the appearance of your x and y labels (such as displaying a dollar sign in front of adjusted prices)

Hands-On Exercise, solutions

#Libraries
library(tidyquant)
library(tidyverse)
library(ggthemes)
library(scales)

#Setting start/end dates + reddit date
start='2020-12-01'
end='2021-03-31'
reddit_date=as.Date('2021-01-25')

#Get the data
tq_get('GME',from=start,to=end)%>%
  #Mapping
  ggplot(aes(x=date,group=symbol))+
  #Geom
  geom_candlestick(aes(open = open, high = high, low = low, close = close))+
  #Labels
  labs(x='',
       y='Adjusted Prices',
       title='GameStop (ticker: GME) prices during the reddit (Wall St. Bets) frenzy',
       subtitle='Source: Yahoo! Finance')+
  #Annotation
  geom_vline(xintercept=reddit_date,linetype='dashed')+
  annotate(geom='text',x=reddit_date-5,y=75,label='Reddit Frenzy Starts',angle=90)+
  #Scales
  scale_x_date(date_breaks = '2 weeks') +
  scale_y_continuous(labels = dollar) +
  #Custom 'The Economist' theme
  theme_economist()+
  #Adding further customizations
  theme(legend.position='none',
        axis.title.y = element_text(vjust=+4,face='bold'),
        axis.title.x = element_text(vjust=-3,face='bold'),
        plot.title = element_text(size=10),
        plot.subtitle = element_text(size=8,vjust=-2,hjust=0,margin = margin(b=15)),
        axis.text.y = element_text(size=8),
        axis.text.x = element_text(angle=45,size=8,vjust=0.75))

References

Scheuch, Christoph, Stefan Voigt, and Patrick Weiss. 2023. Tidy Finance with R. Chapman & Hall/CRC. https://www.tidy-finance.org/r/.
Wickham, Hadley, Mine Cetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.
Wilkinson, Leland. 2005. The Grammar of Graphics. 2nd ed. Statistics and Computing. New York, NY: Springer.